Stable Topic Modeling with Local Density Regularization
Authors
Abstract
Topic modeling has emerged over the last decade as a powerful tool for analyzing large text corpora, including Web-based user-generated texts. Topic stability, however, remains a concern: topic models have a very complex optimization landscape with many local maxima, and even different runs of the same model yield very different topics. To make topic modeling more stable, we propose an approach based on local density regularization, in which words within a local context window of a given word are more likely to be assigned the same topic as that word. We compare several models with local density regularizers and show that they can improve topic stability while remaining on par with classical models in terms of quality metrics.
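The abstract does not give the form of the regularizer, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a sampling-style setting in which each token already has a topic assignment, and it reweights one token's topic distribution by how often each topic occurs among its neighbours in the context window. The function name `smooth_topic_probs` and the parameters `window` and `strength` are hypothetical.

```python
from collections import Counter


def smooth_topic_probs(base_probs, assignments, position, window=2, strength=1.0):
    """Pull one token's topic distribution toward its neighbours' topics.

    base_probs:  topic distribution for this token from the underlying model
                 (assumed non-negative with a positive sum)
    assignments: current topic index of every token in the document
    position:    index of the token being updated
    window:      neighbours considered on each side (hypothetical parameter)
    strength:    weight of the neighbourhood term; 0 disables it (hypothetical)
    """
    lo = max(0, position - window)
    hi = min(len(assignments), position + window + 1)

    # Count how often each topic appears among the neighbouring tokens.
    neighbour_counts = Counter(
        assignments[i] for i in range(lo, hi) if i != position
    )

    # Boost each topic in proportion to its count in the window, then renormalize.
    weighted = [
        p * (1.0 + strength * neighbour_counts.get(topic, 0))
        for topic, p in enumerate(base_probs)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# A token with an undecided base distribution, surrounded by topic-0 tokens.
base = [0.5, 0.5]
topics = [0, 0, 1, 0, 0]
probs = smooth_topic_probs(base, topics, position=2)
unregularized = smooth_topic_probs(base, topics, position=2, strength=0.0)
```

With `strength=0.0` the function returns the base distribution unchanged, which makes it easy to compare regularized and unregularized runs on the same data.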
Similar resources
Semantic Visualization with Neighborhood Graph Regularization
Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization directly reduce this into visualizable two or three dimensions. Recent approaches consi...
Additive Regularization of Topic Models for Topic Selection and Sparse Factorization
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection in terms of Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers. The entropy regularization gradu...
Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models. ARTM is free of redundant probabilistic assumptions and provides a simple inference for many combined and multi-objective topic models.
A simple Analytical model for solidification cooling rate based on the local heat flux density
A new simple analytical model for prediction of cooling rate in the solidification process based on the local heat flux density extracted during solidification is introduced. In the modeling procedure, a solidifying control volume is considered in the mushy zone in which a heat balance equation is used to derive the present model. As the local heat flux density is a measurable parameter, the pr...
Manifold Learning for Semantic Visualization
Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the size of the vocabulary. Classical approaches to document visualization directly reduce this into visualizable two or three dimensions, using techniques...
Publication date: 2016